Technologies for efficient LZ77-based data decompression

ABSTRACT

Technologies for data decompression include a computing device that reads a symbol tag byte from an input stream. The computing device determines whether the symbol can be decoded using a fast-path routine, and if not, executes a slow-path routine to decompress the symbol. The slow-path routine may include data-dependent branch instructions that may be unpredictable using branch prediction hardware. For the fast-path routine, the computing device determines a next symbol increment value, a literal increment value, a data length, and an offset based on the tag byte, without executing an unpredictable branch instruction. The computing device sets a source pointer to either literal data or reference data as a function of the tag byte, without executing an unpredictable branch instruction. The computing device may set the source pointer using a conditional move instruction. The computing device copies the data and processes remaining symbols. Other embodiments are described and claimed.

CROSS-REFERENCE TO RELATED APPLICATION

The present application is a continuation application of U.S. application Ser. No. 14/494,766, entitled “TECHNOLOGIES FOR EFFICIENT LZ77-BASED DATA DECOMPRESSION,” which was filed on Sep. 24, 2014.

BACKGROUND

Software data decompression is an important software operation used in many computing applications, including both server and client applications. Many common lossless compression formats are based on the LZ77 compression algorithm. Data compressed using LZ77-based algorithms typically include a stream of symbols. Each symbol may include literal data that is to be copied to the output or a reference to repeat data that has already been decompressed. Compared to other lossless compression algorithms such as DEFLATE, LZ77-based algorithms typically achieve lower compression levels but provide higher performance, particularly for decompression. One typical LZ77-based format is “Snappy,” developed by Google Inc. and used by the Apache Hadoop™ project and others. Other LZ77-based formats include LZO and LZF.

Typical implementations of decompression algorithms include numerous conditional branches used to categorize input symbols. The outcome of those conditional branches is dependent on the input data. Branch prediction hardware for typical processors may have difficulty correctly predicting the outcome of those conditional branches. Branch misprediction penalties may reduce achievable decompression performance.

BRIEF DESCRIPTION OF THE DRAWINGS

The concepts described herein are illustrated by way of example and not by way of limitation in the accompanying figures. For simplicity and clarity of illustration, elements illustrated in the figures are not necessarily drawn to scale. Where considered appropriate, reference labels have been repeated among the figures to indicate corresponding or analogous elements.

FIG. 1 is a simplified block diagram of at least one embodiment of a computing device for efficient data decompression;

FIG. 2 is a simplified block diagram of at least one embodiment of an environment of the computing device of FIG. 1;

FIGS. 3A and 3B are a simplified flow diagram of at least one embodiment of a method for efficient data decompression that may be executed by the computing device of FIGS. 1 and 2;

FIG. 4 is a schematic diagram of compressed data symbols that may be decompressed by the method of FIGS. 3A and 3B;

FIG. 5 is a pseudocode listing of at least one embodiment of the method of FIGS. 3A and 3B; and

FIG. 6 is a pseudocode listing of another embodiment of part of the method of FIGS. 3A and 3B.

DETAILED DESCRIPTION OF THE DRAWINGS

While the concepts of the present disclosure are susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will be described herein in detail. It should be understood, however, that there is no intent to limit the concepts of the present disclosure to the particular forms disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives consistent with the present disclosure and the appended claims.

References in the specification to “one embodiment,” “an embodiment,” “an illustrative embodiment,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may or may not necessarily include that particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described. Additionally, it should be appreciated that items included in a list in the form of “at least one A, B, and C” can mean (A); (B); (C); (A and B); (A and C); (B and C); or (A, B, and C). Similarly, items listed in the form of “at least one of A, B, or C” can mean (A); (B); (C); (A and B); (A and C); (B and C); or (A, B, and C).

The disclosed embodiments may be implemented, in some cases, in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on a transitory or non-transitory machine-readable (e.g., computer-readable) storage medium, which may be read and executed by one or more processors. A machine-readable storage medium may be embodied as any storage device, mechanism, or other physical structure for storing or transmitting information in a form readable by a machine (e.g., a volatile or non-volatile memory, a media disc, or other media device).

In the drawings, some structural or method features may be shown in specific arrangements and/or orderings. However, it should be appreciated that such specific arrangements and/or orderings may not be required. Rather, in some embodiments, such features may be arranged in a different manner and/or order than shown in the illustrative figures. Additionally, the inclusion of a structural or method feature in a particular figure is not meant to imply that such feature is required in all embodiments and, in some embodiments, may not be included or may be combined with other features.

Referring now to FIG. 1, an illustrative computing device 100 for efficient data decompression includes a processor 120, an I/O subsystem 122, a memory 124, and a data storage device 126. In use, as described below, the computing device 100 is configured to read symbols from compressed input data, decode the symbols, and output decompressed data. The computing device 100 determines whether each input symbol may be decompressed using a fast-path decompression routine or using a traditional slow-path decompression routine. Most symbols may be decompressed using the fast-path routine and thus the determination of whether to apply the fast-path routine may be predicted with high accuracy by branch prediction hardware of the processor 120. During execution of the fast-path routine, the computing device 100 determines the starting address and length of data to be copied without performing data-dependent branch instructions that may be unpredictable using branch prediction hardware. The computing device 100 may use small data tables or other fast techniques to determine the location of the next symbol in the input stream, again without performing data-dependent branch instructions. By avoiding unpredictable branch instructions, the computing device 100 may avoid branch misprediction penalties. By reducing the execution time of a critical instruction path to determine the next symbol location, the computing device 100 may improve throughput, especially when using a processor 120 capable of out-of-order execution. Illustratively, a computing device 100 as described herein may achieve decompression speeds up to 50-100% faster than previously known, optimized software decompression techniques.

The computing device 100 may be embodied as any type of device capable of efficient data decompression and otherwise performing the functions described herein. For example, the computing device 100 may be embodied as, without limitation, a laptop computer, a notebook computer, a tablet computer, a smartphone, a mobile computing device, a wearable computing device, a computer, a desktop computer, a workstation, a server computer, a distributed computing system, a multiprocessor system, a consumer electronic device, a smart appliance, and/or any other computing device capable of efficient data decompression. As shown in FIG. 1, the illustrative computing device 100 includes the processor 120, the I/O subsystem 122, the memory 124, and the data storage device 126. Of course, the computing device 100 may include other or additional components, such as those commonly found in a computer (e.g., various input/output devices), in other embodiments. Additionally, in some embodiments, one or more of the illustrative components may be incorporated in, or otherwise form a portion of, another component. For example, the memory 124, or portions thereof, may be incorporated in the processor 120 in some embodiments.

The processor 120 may be embodied as any type of processor capable of performing the functions described herein. For example, the processor 120 may be embodied as a single or multi-core processor(s), digital signal processor, microcontroller, or other processor or processing/controlling circuit. Similarly, the memory 124 may be embodied as any type of volatile or non-volatile memory or data storage capable of performing the functions described herein. In operation, the memory 124 may store various data and software used during operation of the computing device 100 such as operating systems, applications, programs, libraries, and drivers. The memory 124 is communicatively coupled to the processor 120 via the I/O subsystem 122, which may be embodied as circuitry and/or components to facilitate input/output operations with the processor 120, the memory 124, and other components of the computing device 100. For example, the I/O subsystem 122 may be embodied as, or otherwise include, memory controller hubs, input/output control hubs, firmware devices, communication links (e.g., point-to-point links, bus links, wires, cables, light guides, printed circuit board traces, etc.) and/or other components and subsystems to facilitate the input/output operations. In some embodiments, the I/O subsystem 122 may form a portion of a system-on-a-chip (SoC) and be incorporated, along with the processor 120, the memory 124, and other components of the computing device 100, on a single integrated circuit chip.

The data storage device 126 may be embodied as any type of device or devices configured for short-term or long-term storage of data such as, for example, memory devices and circuits, memory cards, hard disk drives, solid-state drives, or other data storage devices. The data storage device 126 may store compressed and/or decompressed data processed by the computing device 100.

The computing device 100 may also include a communication subsystem 128, which may be embodied as any communication circuit, device, or collection thereof, capable of enabling communications between the computing device 100 and other remote devices over a computer network (not shown). The communication subsystem 128 may be configured to use any one or more communication technologies (e.g., wired or wireless communications) and associated protocols (e.g., Ethernet, Bluetooth®, Wi-Fi®, WiMAX, etc.) to effect such communication.

Referring now to FIG. 2, in the illustrative embodiment, the computing device 100 establishes an environment 200 during operation. The illustrative environment 200 includes an input module 202, a symbol tag decoding module 204, a data source module 206, an output module 208, and a slow-path module 210. In use, the computing device 100 is configured to read compressed data from an input stream 212, decode the compressed data, and write decompressed data to an output stream 214. The various modules of the environment 200 may be embodied as hardware, firmware, software, or a combination thereof. For example, the various modules, logic, and other components of the environment 200 may form a portion of, or otherwise be established by, the processor 120 or other hardware components of the computing device 100.

The input module 202 is configured to manage access to the input stream 212. The input module 202 is configured to open the input stream 212 and read symbols and other data from the input stream 212. The input module 202 may maintain an input stream pointer that may be used to access compressed data from the input stream 212. The input stream 212 may be embodied as any in-memory data structure including compressed data. The input stream 212 may be backed by or otherwise associated with a file, network connection, memory buffer, or any other source of compressed data.

The symbol tag decoding module 204 is configured to determine a next symbol increment value, a literal increment value, a data length, and an offset value based on a symbol tag value read by the input module 202. As described above, the compressed data includes a series of symbols. Each symbol may be a literal symbol or a reference symbol, and each symbol may occupy a variable number of bytes in the input stream 212. Literal symbols include literal data that is to be copied to the output stream 214 during decompression. Reference symbols refer to previously decompressed data that is to be copied to the output stream 214 during decompression, for example by copying data that was previously copied to the output stream 214. The next symbol increment value may be used to identify the location in the input stream 212 of the beginning of the next symbol. The data length indicates how many bytes of data are to be copied, for both literal symbols and reference symbols. The literal increment value may be used to locate the start of literal data in the input stream 212, and the offset value may be used to locate the start of reference data in the output stream 214. In some embodiments, the symbol tag decoding module 204 may be configured to determine the next symbol increment value, the literal increment value, and the data length by indexing a next symbol increment table 216, a literal increment table 218, and a length table 220, respectively. Additionally or alternatively, in some embodiments the symbol tag decoding module 204 may be configured to determine those values programmatically.

The data source module 206 is configured to maintain a source pointer used to access data to be copied to the decompressed output. The data source module 206 is configured to set the source pointer to point to literal data from the input stream 212 or to reference previously decompressed data from the output stream 214 based on the type of the current symbol (i.e., a literal symbol or a reference symbol). The data source module 206 is configured to set the source pointer without executing any unpredictable branch instructions, for example by using conditional move instructions.

The output module 208 is configured to manage access to the output stream 214. The output module 208 is configured to open the output stream 214 and write decompressed data to the output stream 214. The output module 208 is also configured to allow copying of already-written data from the output stream 214, which is used to decompress reference symbols. The output module 208 may maintain an output stream pointer that may be used to read decompressed data from and/or write decompressed data to the output stream 214. The output stream 214 may be embodied as any in-memory data structure capable of storing and referencing decompressed data. The output stream 214 may be backed by or otherwise associated with a file, network connection, memory buffer, or any other destination for decompressed data.

The slow-path module 210 is configured to execute a slow-path decompression routine to decode symbols that cannot be decoded by the fast path (i.e., by the input module 202, the symbol tag decoding module 204, the data source module 206, and the output module 208). The slow-path decompression routine may be embodied as any optimized or unoptimized decompression algorithm capable of decoding the symbols that cannot be decoded by the fast path such as, for example, the Snappy decompression library developed by Google Inc. The slow-path decompression routine may include unpredictable branch instructions based on the class of the current symbol, for example.

Referring now to FIG. 3A, in use, the computing device 100 may execute a method 300 for efficient data decompression. The method 300 begins in block 302, in which the computing device 100 opens the input stream 212 and sets an input stream pointer to the first symbol included in the stream. The computing device 100 may use any technique for opening the input stream 212, such as copying the contents of a file to a memory buffer or memory-mapping a file. For many file formats, the input stream 212 may include a header or other data before the first symbol. For example, the Snappy compressed file format includes a preamble storing the uncompressed length of the file as a little-endian varint. The computing device 100 may perform any appropriate operations to process the header and increment the input stream pointer to the first symbol.
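By way of illustration only, the preamble processing of block 302 may be sketched in C as follows. The function name read_varint32 and its return convention are hypothetical and are not taken from the figures; the sketch assumes the uncompressed length fits in 32 bits and that the varint therefore occupies at most five bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes a little-endian base-128 varint holding a
 * 32-bit uncompressed length. Each byte carries seven payload bits; a set
 * high bit means another byte follows. Returns the number of bytes
 * consumed, or 0 if the input is truncated or does not fit in 32 bits. */
size_t read_varint32(const uint8_t *in, size_t avail, uint32_t *value)
{
    uint32_t result = 0;
    for (size_t i = 0; i < avail && i < 5; i++) {
        uint32_t b = in[i];
        if (i == 4 && b > 0x0Fu)
            return 0;                      /* would overflow 32 bits */
        result |= (b & 0x7Fu) << (7 * i);  /* append the next 7 bits */
        if ((b & 0x80u) == 0) {
            *value = result;
            return i + 1;                  /* first symbol starts here */
        }
    }
    return 0;
}
```

The returned byte count may then be added to the input stream pointer so that it points to the tag byte of the first symbol.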

In block 304, the computing device 100 opens the output stream 214 and initializes an output stream pointer. The computing device 100 may use any technique for opening the output stream 214, such as creating an in-memory write buffer or memory-mapping a file. After the output stream 214 is opened, the computing device 100 is capable of writing data to the output stream 214 starting at the output stream pointer. The computing device 100 is also capable of reading and/or copying data from the output stream 214, which is used to decompress reference symbols.

In block 306, the computing device 100 reads a tag value for the current input symbol from the input stream pointer. In the illustrative embodiment, which decompresses files stored in the Snappy format, the computing device 100 reads a single tag byte from the input stream 212. However, in other embodiments, the computing device 100 may read additional tag data. As described above, the input stream 212 includes a sequence of symbols, including literal symbols and reference symbols. The tag value may be used by the computing device 100 to determine the type of the current symbol, and may also be used to determine parameters of the symbol such as offset and data length.

Referring now to FIG. 4, a diagram 400 illustrates various symbols used in the Snappy format. Each symbol begins with a tag byte and includes a two-bit class value in the least-significant two bits of the tag byte that may be used to categorize the symbols. Symbols 402, 404 are literal symbols, with a class value equal to the binary value “00.” The symbol 402 includes a tag byte followed by a sequence of literal data bytes. The most-significant six bits of the tag byte represent the length of the literal minus one. The symbol 402 may represent literals of lengths from 1 to 60 bytes, inclusive. Thus, the symbol 402 may be used for relatively short literals.

The symbol 404 may be used for longer literals. The symbol 404 includes a tag byte followed by a length value, and then followed by a sequence of literal data bytes. The most-significant six bits of the tag byte are coded to indicate how many bytes are used to store the length value. Binary values corresponding to 60, 61, 62, or 63 correspond to lengths of 1-4 bytes, respectively. The length value is equal to the length of the literal minus one, stored in little-endian format.
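The layout of the literal symbols 402, 404 may be illustrated with a straightforward, branching decode such as the following C sketch. The sketch is not the fast-path routine; the function name literal_length and its header_len output parameter are hypothetical, and the caller is assumed to have verified that the class value is “00” and that enough input bytes are available.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the literal length of a literal symbol and
 * stores in *header_len the number of bytes (1 to 5) that precede the
 * literal data. Assumes (tag & 3) == 0 and that the header is readable. */
uint32_t literal_length(const uint8_t *sym, size_t *header_len)
{
    uint32_t upper = sym[0] >> 2;          /* most-significant six bits */
    if (upper < 60) {                      /* short literal, symbol 402 */
        *header_len = 1;
        return upper + 1;                  /* stored as length minus one */
    }
    size_t n = upper - 59;                 /* 60..63 -> 1..4 length bytes */
    uint32_t len_minus_one = 0;
    for (size_t i = 0; i < n; i++)         /* little-endian length value */
        len_minus_one |= (uint32_t)sym[1 + i] << (8 * i);
    *header_len = 1 + n;
    return len_minus_one + 1;
}
```

For example, a tag byte of 0xF0 (upper six bits equal to 60) followed by the byte 0x63 describes a 100-byte literal whose data begins two bytes after the start of the symbol.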

Symbols 406, 408, 410 are reference symbols. The symbol 406 is a one-byte offset reference symbol with a class value equal to the binary value “01.” The symbol 406 includes an 11-bit offset value. The three most-significant bits of the offset (o₁₀ through o₈) are stored in the three most-significant bits of the tag byte, and the rest of the offset (bits o₇ through o₀) is stored in the next byte following the tag byte. The tag byte also includes a three-bit length value positioned between the offset bits and the class bits (bits numbered 4 to 2 of the tag byte, labeled l₂ through l₀). The length value represents the data length minus four; therefore, the symbol 406 may store lengths from four to 11, inclusive, and offsets from zero to 2047, inclusive.
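As a worked illustration of the bit layout of the symbol 406, the following C sketch extracts the length and offset fields. The structure copy_ref and the function name are hypothetical names introduced only for this example.

```c
#include <stdint.h>

/* Hypothetical result type for a decoded reference symbol. */
typedef struct {
    uint32_t length;   /* number of bytes to copy */
    uint32_t offset;   /* distance back from the output stream pointer */
} copy_ref;

/* Decodes a one-byte offset reference symbol (class "01"). */
copy_ref decode_one_byte_offset_ref(const uint8_t sym[2])
{
    copy_ref r;
    r.length = ((sym[0] >> 2) & 0x7u) + 4;               /* bits 4..2, length minus four */
    r.offset = ((uint32_t)(sym[0] >> 5) << 8) | sym[1];  /* o10..o8, then o7..o0 */
    return r;
}
```

For example, the tag byte 0xAD (binary 101 011 01) followed by the byte 0x34 decodes to a length of 7 and an offset of 0x534.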

The symbol 408 is a two-byte offset reference symbol with a class value equal to the binary value “10.” The symbol 408 includes a two-byte offset value following the tag byte, stored as a little-endian 16-bit integer. In other words, the first byte of the offset stores the eight least-significant bits and the second byte of the offset stores the eight most-significant bits. The length value is stored in the most-significant six bits of the tag byte, and represents the length of data to be copied minus one. The symbol 408 may store lengths from one to 64, inclusive, and offsets from zero to 65,535, inclusive.
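The fields of the two-byte offset reference symbol 408 may similarly be extracted as in the following C sketch, in which the structure copy_ref and the function name are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Hypothetical result type for a decoded reference symbol. */
typedef struct {
    uint32_t length;   /* number of bytes to copy */
    uint32_t offset;   /* distance back from the output stream pointer */
} copy_ref;

/* Decodes a two-byte offset reference symbol (class "10"). */
copy_ref decode_two_byte_offset_ref(const uint8_t sym[3])
{
    copy_ref r;
    r.length = (uint32_t)(sym[0] >> 2) + 1;                 /* upper six bits, length minus one */
    r.offset = (uint32_t)sym[1] | ((uint32_t)sym[2] << 8);  /* little-endian 16-bit offset */
    return r;
}
```

For example, the bytes 0xFE, 0x34, 0x12 decode to a length of 64 and an offset of 0x1234.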

The symbol 410 is a four-byte offset reference symbol with a class value equal to the binary value “11.” The symbol 410 includes a four-byte offset value following the tag byte, stored as a little-endian 32-bit integer. The length value is stored in the most-significant six bits of the tag byte, and equals the length of data to be copied minus one. Four-byte offset reference symbols may be very rare in compressed data in use, because generation of such symbols would require compressors to maintain a large amount of history.

Referring back to FIG. 3A, after reading the tag value, in block 308 the computing device 100 determines whether the tag value is greater than a maximum data length. If the tag value is greater than the maximum data length, then the computing device 100 may not be capable of a fast-path decompression of the symbol. In many embodiments, the maximum data length may be 60 bytes, because as described above, if the upper six bits of the tag value are greater than or equal to 60 (because the upper six bits include the length minus one), then the symbol may be a large literal symbol, such as the symbol 404 of FIG. 4. In those embodiments, the computing device 100 may determine whether the upper six bits of the tag byte are greater than or equal to 60, which may be equivalent to determining whether the tag value is greater than or equal to 60 times four. If the computing device 100 determines that the tag value is greater than the maximum data length, the method 300 branches to block 310. If not, the computing device 100 advances to block 312, described below.

In block 310, the computing device 100 decompresses the current input symbol using a slow-path routine. The slow-path routine may perform an ordinary, optimized or unoptimized decompression algorithm. In particular, when executing the slow-path routine, the computing device 100 may perform unpredictable branch instructions or other potentially slow instructions to perform correct decompression of the input symbol. Because the slow path is rarely taken, overall performance of the method 300 may not be adversely impacted by branch misprediction penalties or other performance issues in the slow path. When executing the slow-path routine, the computing device 100 may not make assumptions about the type of the current input symbol and thus may check for all potential formats. For example, although the tag byte may have its upper six bits greater than or equal to 60, the current input symbol is not necessarily a long literal symbol. For example, the current symbol may be a two- or four-byte reference symbol having a length greater than 60. After decompressing the current symbol and updating the input stream pointer and the output stream pointer accordingly, the method 300 loops back to block 306 to continue decompressing the next symbol.

Referring back to block 308, if the tag value is not greater than or equal to 60 times four, the method 300 advances to block 312. Because the tag value will only rarely be greater than or equal to 60 times four, in most iterations the method 300 advances to block 312. Therefore, branch prediction hardware of the processor 120 may be capable of predicting whether the method 300 advances to block 312 with high accuracy. In block 312, the computing device 100 sets a class value equal to the lower two bits of the tag byte. The computing device 100 may, for example, isolate the lower two bits using one or more bitwise operations. As shown in FIG. 4, the two-bit class value identifies four different types of symbols: literals, one-byte offset references, two-byte offset references, and four-byte offset references.

In block 314, the computing device 100 determines whether the class of the current symbol is a four-byte offset reference symbol. For example, the computing device 100 may determine whether the class value equals the binary value “11.” The computing device 100 may not be capable of a fast-path decompression of four-byte offset reference symbols. If the symbol is a four-byte offset reference symbol, the method 300 branches to block 310 to perform the slow-path decompression routine, as described above. If the symbol is not a four-byte offset reference symbol, the method 300 advances to block 316. As described above, four-byte offset reference symbols are rare. Therefore, in most iterations the method 300 advances to block 316, and branch prediction hardware of the processor 120 may be capable of predicting whether the method 300 advances to block 316 with high accuracy.
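The two predictable tests of blocks 308 and 314, together with the class extraction of block 312, may be sketched in C as follows. The function name needs_slow_path and the constant name are hypothetical; the threshold of 60 times four follows the embodiment described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Upper six bits >= 60 is equivalent to the whole tag byte >= 60 * 4. */
enum { SLOW_PATH_TAG_THRESHOLD = 60 * 4 };

/* Hypothetical helper: returns true if the symbol must take the slow path. */
bool needs_slow_path(uint8_t tag)
{
    if (tag >= SLOW_PATH_TAG_THRESHOLD)   /* block 308: rarely true */
        return true;
    unsigned cls = tag & 0x3u;            /* block 312: lower two bits */
    return cls == 3;                      /* block 314: four-byte offset */
}
```

Both branches depend on the input data, but because each is expected to be false for most symbols, they are the kind of branch that prediction hardware may handle well.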

In block 316, the computing device 100 looks up the increment value to the next symbol by indexing the next symbol increment table 216 using the tag byte. The next symbol increment table 216 is small (e.g., no more than 256 bytes), and may be capable of being stored in cache memory of the processor 120. Therefore, determination of the next symbol increment value may be completed very quickly (i.e., in the time required for an L1 cache hit, for example four clock cycles). As described below, the next symbol increment value may be added to the input stream pointer to determine the position in the input stream 212 of the next symbol. Thus, the determination of the next symbol increment value may be on the critical path of the method 300. By calculating the next symbol increment value quickly and without depending on any unpredictable branches, the computing device 100 may reduce the length and/or latency of the critical path. In particular, in embodiments having a processor 120 capable of out-of-order instruction execution, the computing device 100 may be capable of continuing out-of-order execution for decompression of additional input symbols. Further, although illustrated as looking up the next symbol increment value from the next symbol increment table 216, it should be understood that in some embodiments the computing device 100 may calculate the next symbol increment value programmatically.
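One possible way to populate a 256-entry next symbol increment table is sketched below in C. The table contents are an assumption derived from the symbol layouts of FIG. 4 rather than a listing taken from the figures, and the names next_symbol_increment and init_next_symbol_increment are hypothetical. Entries for tag bytes handled by the slow path are set to zero here because the fast path never reads them.

```c
#include <stdint.h>

/* Hypothetical table: bytes from the tag byte to the next symbol's tag. */
uint8_t next_symbol_increment[256];

void init_next_symbol_increment(void)
{
    for (unsigned tag = 0; tag < 256; tag++) {
        unsigned cls = tag & 0x3u;
        uint8_t inc = 0;                           /* unused slow-path entry */
        if (tag < 60 * 4) {
            if (cls == 0)
                inc = (uint8_t)(1 + (tag >> 2) + 1);  /* tag + literal bytes */
            else if (cls == 1)
                inc = 2;                           /* tag + one offset byte */
            else if (cls == 2)
                inc = 3;                           /* tag + two offset bytes */
        }
        next_symbol_increment[tag] = inc;
    }
}
```

The branches above run only during initialization. During decompression, the lookup reduces to a single indexed load such as next_symbol_increment[tag], which does not depend on any branch outcome.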

In block 318, the computing device 100 looks up the increment value to the start of a literal symbol by indexing the literal increment table 218 using the tag byte. As illustrated by symbols 402, 404 of FIG. 4, for literal symbols, the increment from the address of the tag byte to the start of the literal data may vary between 1-5 bytes, depending on the length of the literal data. Additionally, the literal increment table 218 may include any value for tag bytes associated with reference symbols, because the literal increment value will not be used for those symbols. Similar to the next symbol increment table 216, the literal increment table 218 is small (e.g., no more than 256 bytes), and may be capable of being stored in cache memory of the processor 120. Further, although illustrated as looking up the literal increment value from the literal increment table 218, it should be understood that in some embodiments the computing device 100 may calculate the literal increment value programmatically. Additionally or alternatively, in some embodiments the fast-path decompression routine may only decompress relatively short literals having an offset of one byte. In those embodiments, the literal increment table 218 may include all “ones,” or the increment to the start of the literal data may be a constant value of one and the literal increment table 218 may be omitted.

In block 320, the computing device 100 looks up the length of data to be copied by indexing the length table 220 using the tag byte. All of the symbols that may be processed by the fast-path decompression process include a representation of the data length in the tag byte itself. For example, the symbols 402, 406, 408 of FIG. 4 all include the length in the tag byte. The length table 220 may include any value (or no values) for tag bytes that will not be processed by the fast-path routine. Similar to the tables 216, 218, the length table 220 is small (e.g., no more than 256 bytes), and may be capable of being stored in cache memory of the processor 120. Further, although illustrated as looking up the data length value from the length table 220, it should be understood that in some embodiments the computing device 100 may calculate the data length value programmatically.
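A length table of this kind might be populated as in the following C sketch. As with the other tables, the contents are an assumption derived from the tag byte layouts of FIG. 4, and the names length_table and init_length_table are hypothetical. Entries for tag bytes that are never processed by the fast path hold arbitrary values.

```c
#include <stdint.h>

/* Hypothetical table: number of bytes to copy for each fast-path tag. */
uint8_t length_table[256];

void init_length_table(void)
{
    for (unsigned tag = 0; tag < 256; tag++) {
        unsigned cls = tag & 0x3u;
        unsigned upper = tag >> 2;                 /* upper six bits */
        if (cls == 1)                              /* one-byte offset reference */
            length_table[tag] = (uint8_t)((upper & 0x7u) + 4);
        else                                       /* literal or two-byte offset */
            length_table[tag] = (uint8_t)(upper + 1);
    }
}
```

Because the largest stored value is 64, each entry fits in one byte and the whole table occupies 256 bytes.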

In block 322, the computing device 100 right-shifts the tag value by five bits, retaining the original upper three bits. Right-shifting the tag value allows the computing device 100 to extract the upper bits of the offset for one-byte reference symbols. As illustrated by symbol 406 of FIG. 4, right-shifting the tag byte by five bits moves the upper three bits of the offset (o₁₀ through o₈) to the three least-significant bits of the tag byte.

In block 324, the computing device 100 sets a literal pointer to point to the input stream pointer plus the literal increment value determined as described above in block 318. The computing device 100 may set the literal pointer without determining whether the current symbol is a literal symbol or a reference symbol. As described below, the computing device 100 will disregard the literal pointer for reference symbols.

Referring now to FIG. 3B, the method 300 continues with block 326, in which the computing device 100 conditionally moves the second byte of the offset value to the tag value if the symbol is a two-byte offset reference symbol, retaining the tag value if the symbol is not a two-byte offset reference symbol. As illustrated by symbol 408 of FIG. 4, for two-byte offset symbols, the second byte of the offset value includes the eight most-significant bits of the offset value (bits 15 through 8). The computing device 100 may use any technique to conditionally set the tag value without executing a conditional branch instruction. For example, the computing device 100 may perform the conditional move using a ternary operator, predicated instruction, conditional move instruction, or other processor instruction. In the illustrative embodiment, the computing device 100 tests whether the class value matches the binary value “10” and then executes a conditional move (CMOV) instruction based on the results of the test. Performing a conditional move instruction allows the computing device 100 to select the proper value without executing any unpredictable branch instructions and incurring associated branch misprediction penalties.

In block 328, the computing device 100 sets an offset value to the concatenation of the tag value and the first byte of the offset. As described above, the tag value includes either the three most-significant bits of the offset value, as described in connection with block 322 above, or the eight most-significant bits of the offset value, as described in connection with block 326 above. Thus, after concatenation the offset value is correct for both one-byte and two-byte offset symbols. Additionally, as described above, the offset value may have been calculated without executing any unpredictable branch instructions.
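Blocks 322, 326, and 328 may be sketched together in C as shown below. The function name fast_path_offset is hypothetical. The sketch assumes that at least three bytes are readable at the start of the symbol, so that the second offset byte can be loaded unconditionally, and it relies on the compiler translating the ternary expression into a conditional move, which is typical for optimizing compilers but not guaranteed by the C language.

```c
#include <stdint.h>

/* Hypothetical helper: computes the offset for one-byte and two-byte
 * offset reference symbols. For literal symbols the result is meaningless
 * and is expected to be disregarded by the caller. */
uint32_t fast_path_offset(const uint8_t *sym)
{
    uint32_t tag = sym[0];
    uint32_t cls = tag & 0x3u;
    uint32_t second = sym[2];              /* loaded unconditionally (assumption) */
    uint32_t high = tag >> 5;              /* block 322: o10..o8 for class "01" */
    high = (cls == 2) ? second : high;     /* block 326: bits 15..8 for class "10" */
    return (high << 8) | sym[1];           /* block 328: concatenate with first byte */
}
```

For example, the bytes 0xAD, 0x34, 0x00 yield an offset of 0x534, and the bytes 0x06, 0x34, 0x12 yield an offset of 0x1234.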

In block 330, the computing device 100 conditionally sets a source pointer to the literal pointer if the symbol has a literal class value and to the difference between the output stream pointer and the offset value if the symbol does not have a literal class value. The computing device 100 may use any technique to conditionally set the source pointer. For example, the computing device 100 may perform the conditional move using a ternary operator, predicated instruction, conditional move instruction, or other processor instruction. In the illustrative embodiment, the computing device 100 tests whether the class value is the binary value “00,” and then executes a conditional move (CMOV) instruction to move either the literal pointer or the difference between the output stream pointer and the offset value based on the results of the test. Performing a conditional move instruction allows the computing device 100 to set the proper value of the source pointer without executing any unpredictable branch instructions and thus without incurring associated branch misprediction penalties. After determining the source pointer, the computing device 100 may copy the data length number of bytes from the source pointer to the output stream pointer, as further described below. The actual data is thus copied either from the literal data of the input stream 212 or from data previously output to the output stream 214.
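The selection of block 330 might look like the following C sketch. The name `select_source` and its parameter list are hypothetical, and the literal class is assumed to be the value 0 (binary “00”). Both candidate addresses are computed unconditionally so that the final choice is a plain value selection, which a compiler may translate into a conditional move.

```c
#include <stdint.h>

/* Hypothetical helper: choose where the data of the current symbol comes from. */
static const uint8_t *select_source(uint32_t cls,
                                    const uint8_t *literal_ptr,
                                    const uint8_t *out,
                                    uint32_t offset)
{
    /* Integer arithmetic is used so that no out-of-range pointer is formed
     * when the symbol is a literal and 'offset' holds a meaningless value. */
    uintptr_t lit = (uintptr_t)literal_ptr;
    uintptr_t ref = (uintptr_t)out - offset;

    /* Block 330: literal class selects the literal pointer, any other
     * class selects previously written output. */
    uintptr_t src = (cls == 0u) ? lit : ref;
    return (const uint8_t *)src;
}
```

Converting between pointers and `uintptr_t` is implementation-defined in C, so this form is a design choice for the sketch rather than a requirement of the method.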

In block 332, the computing device 100 determines whether the offset value is less than 16 and whether the class of the symbol is not literal. In some embodiments, the computing device 100 may perform short-circuit logical evaluation of that test. That is, if the offset value is greater than or equal to 16, the computing device 100 need not test the class of the symbol. In many embodiments, the offset value is usually greater than or equal to 16. Thus, the determination of block 332 may be highly predictable using branch prediction hardware of the computing device 100. If the offset value is less than 16 and the class is not literal, the method 300 branches to block 336, described below. If not, the method 300 branches to block 334.

In block 334, the computing device 100 performs a block memory copy, in 16-byte blocks, of the data length number of bytes from the source pointer to the output stream pointer. In many embodiments, the processor 120 of the computing device 100 may be capable of fast 16-byte unaligned memory copies, for example using specialized vector instructions or vector registers. Because the computing device 100 copies 16 bytes of data at a time, more data than the requested data length may be copied to the output stream 214. However, as described below, after completing the copy, that excess data will be positioned past the output stream pointer and thus will be overwritten with correct data when additional symbols are decoded. After copying the data, the method 300 advances to block 338, described below.

Referring back to block 332, if the offset value is less than 16 and the class is not literal, the method 300 branches to block 336. In block 336, the computing device 100 performs a byte-by-byte memory copy of data length bytes from the source pointer to the output stream pointer. Performing a byte-by-byte copy may typically be slower than a block copy, but the byte-by-byte copy may be required for correctness, for example when the source and destination ranges of a 16-byte block would overlap, and/or to avoid page faults or other errors. After copying the data, the method 300 advances to block 338.
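The copy decision of blocks 332 through 336 could be sketched as below. The function name `copy_symbol` is hypothetical, the literal class is assumed to be 0, and both buffers are assumed to have at least 15 bytes of slack past the requested length so that whole 16-byte blocks can be read and written.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy 'length' bytes of symbol data to 'out'. */
static void copy_symbol(uint8_t *out, const uint8_t *src,
                        uint32_t length, uint32_t offset, uint32_t cls)
{
    if (offset < 16u && cls != 0u) {
        /* Block 336: source and destination are closer than one block, so
         * each byte must be written before later bytes are read. */
        for (uint32_t i = 0; i < length; i++)
            out[i] = src[i];
    } else {
        /* Block 334: copy whole 16-byte blocks. Up to 15 extra bytes may be
         * written past 'length'; later symbols overwrite them. The two
         * 16-byte ranges never overlap here because either the data comes
         * from the input stream or the offset is at least 16. */
        for (uint32_t i = 0; i < length; i += 16u)
            memcpy(out + i, src + i, 16);
    }
}
```

For example, with the two bytes `ab` already in the output, a reference with offset 2 and length 6 takes the byte-by-byte path and extends the output to `abababab`.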

In block 338, the computing device 100 increments the output stream pointer by the length of the data copied to the output stream 214. Thus, the output stream pointer is prepared to write decompressed data for the next symbol at the correct position in the output stream 214. In block 340, the computing device 100 increments the input stream pointer by the next symbol increment value. Thus, the input stream pointer is prepared to read data for the next input symbol at the correct position in the input stream 212.

In block 342, the computing device 100 determines whether additional symbols remain to be decoded. The computing device 100 may use any technique to determine whether additional symbols remain, such as comparing the output stream pointer or the input stream pointer to maximum sizes previously determined based on a header of the compressed file, testing whether an end of file has been reached in the input stream 212, or any other test. If additional symbols remain, the method 300 loops back to block 306 to read the next tag byte, shown in FIG. 3A. If no additional symbols remain, the method 300 is completed. The computing device 100 may close the input stream 212 and/or the output stream 214, output the decompressed content, or perform any other required processing to produce the decompressed output. The method 300 may be restarted at block 302 to perform additional decompression.
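Taken together, the fast-path loop of blocks 306 through 342 might resemble the following C sketch. It is an illustration under stated assumptions and is not the pseudocode of FIG. 5: the names `init_tables` and `fast_decode` and the three table names are hypothetical, the tag layout is assumed to follow the Snappy format, the slow-path routine is not shown (the function simply returns -1), the input is assumed to carry at least 16 bytes of padding, and the output buffer at least 15 bytes of slack. The range check on the offset is an added safety measure, not a step of the method 300.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint8_t next_inc_tab[256]; /* bytes from this tag to the next tag */
static uint8_t lit_inc_tab[256];  /* bytes from this tag to literal data */
static uint8_t length_tab[256];   /* bytes produced by this symbol */

/* Fill the lookup tables indexed by tag byte (Snappy-style layout, assumed). */
static void init_tables(void)
{
    for (uint32_t tag = 0; tag < 256u; tag++) {
        uint32_t cls = tag & 3u;
        uint32_t len = (cls == 1u) ? ((tag >> 2) & 7u) + 4u : (tag >> 2) + 1u;
        length_tab[tag] = (uint8_t)len;
        lit_inc_tab[tag] = 1u;
        /* Literal: tag plus data. Reference: tag plus 1 or 2 offset bytes. */
        next_inc_tab[tag] = (uint8_t)((cls == 0u) ? 1u + len : 1u + cls);
    }
}

/* Returns the number of bytes written, or -1 if a symbol needs the slow path. */
static long fast_decode(const uint8_t *in, size_t in_len, uint8_t *out)
{
    const uint8_t *in_end = in + in_len;
    uint8_t *out_start = out;

    while (in < in_end) {                       /* block 342 */
        uint32_t tag = in[0];                   /* block 306 */
        uint32_t cls = tag & 3u;
        uint32_t len = length_tab[tag];
        if (len > 60u || cls == 3u)             /* blocks 308 through 314 */
            return -1;

        uint32_t hi = (cls == 2u) ? in[2] : (tag >> 5);
        uint32_t offset = (hi << 8) | in[1];    /* blocks 322 through 328 */

        /* Added safety check: a reference must point into written output. */
        if (cls != 0u && (offset == 0u || offset > (size_t)(out - out_start)))
            return -1;

        const uint8_t *src = (cls == 0u) ? in + lit_inc_tab[tag]
                                         : out - offset;   /* block 330 */

        if (offset < 16u && cls != 0u) {        /* block 332 */
            for (uint32_t i = 0; i < len; i++)  /* block 336 */
                out[i] = src[i];
        } else {
            for (uint32_t i = 0; i < len; i += 16u)
                memcpy(out + i, src + i, 16);   /* block 334 */
        }

        out += len;                             /* block 338 */
        in += next_inc_tab[tag];                /* block 340 */
    }
    return (long)(out - out_start);
}
```

Calling `init_tables()` once and then `fast_decode()` on the bytes `0x0C 'a' 'b' 'c' 'd' 0x09 0x02` decodes a four-byte literal followed by a reference with offset 2 and length 6, producing `abcdcdcdcd`.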

Referring now to FIG. 5, the pseudocode 500 illustrates one potential embodiment of the method 300. As shown, the pseudocode 500 illustrates operations 306′ through 340′, which each correspond to one illustrative embodiment of blocks 306 through 340 of FIGS. 3A and 3B, respectively. In particular, the pseudocode 500 determines whether to apply the slow-path routine in operations 308′ through 314′ by performing highly predictable “if” statements that, if true, may jump to the slow-path routine, corresponding to one illustrative embodiment of the blocks 308 through 314. Additionally, the pseudocode 500 conditionally assigns the value of the source pointer in the operation 330′, which includes a ternary operator and corresponds to one illustrative embodiment of block 330. As described above, in many embodiments that ternary operator may be compiled to executable code including a conditional move instruction such as CMOV. The pseudocode 500 also uses a similar operation 326′ including a ternary operator to conditionally move the second byte of the offset value, corresponding to one illustrative embodiment of the block 326. Referring now to FIG. 6, the pseudocode 600 illustrates operations 326″, 328″, which each correspond to another illustrative embodiment of the blocks 326, 328 of the method 300, respectively. As shown, the pseudocode 600 also uses a ternary operator to calculate the offset value, which may be compiled to executable code including a conditional move instruction such as CMOV.

EXAMPLES

Illustrative examples of the technologies disclosed herein are provided below. An embodiment of the technologies may include any one or more, and any combination of, the examples described below.

Example 1 includes a computing device for data decompression, the computing device comprising an input module to read a symbol tag value from a memory location identified by an input pointer; a symbol tag decoding module to determine a next symbol increment value, a literal increment value, a data length, and an offset value as a function of the symbol tag value; a data source module to conditionally set a source pointer to (i) the input pointer plus the literal increment value in response to a determination that the symbol tag value includes a literal class value and (ii) to an output pointer minus the offset value in response to a determination that the symbol tag value does not include the literal class value; and an output module to copy data of the data length from a memory location identified by the source pointer to a memory location identified by the output pointer; wherein the input module is further to increment the input pointer by the next symbol increment value in response to copying of the data.

Example 2 includes the subject matter of Example 1, and wherein to conditionally set the source pointer comprises to conditionally set the source pointer without execution of a branch instruction.

Example 3 includes the subject matter of any of Examples 1 and 2, and wherein to conditionally set the source pointer comprises to conditionally set the source pointer with a conditional move instruction.

Example 4 includes the subject matter of any of Examples 1-3, and further including a slow-path module to execute a slow-path decompression routine in response to a determination that a current symbol cannot be fast-path decoded; wherein the symbol tag decoding module is further to determine, as a function of the symbol tag value and prior to a determination of the next symbol increment value, whether the current symbol can be fast-path decoded.

Example 5 includes the subject matter of any of Examples 1-4, and wherein to determine whether the current symbol can be fast-path decoded comprises to determine whether the data length has a predefined relationship with a maximum data length as a function of the symbol tag.

Example 6 includes the subject matter of any of Examples 1-5, and wherein to determine whether the data length has the predefined relationship with the maximum data length comprises to determine whether the data length is greater than sixty bytes.

Example 7 includes the subject matter of any of Examples 1-6, and wherein to determine whether the current symbol can be fast-path decoded comprises to determine whether the symbol tag includes a four-byte offset class value.

Example 8 includes the subject matter of any of Examples 1-7, and wherein to determine the next symbol increment value comprises to index a next symbol increment table with the symbol tag value.

Example 9 includes the subject matter of any of Examples 1-8, and wherein to index the next symbol increment table comprises to look up the next symbol increment value in the next symbol increment table stored in a cache memory of a processor of the computing device.

Example 10 includes the subject matter of any of Examples 1-9, and wherein to determine the literal increment value comprises to index a literal increment table with the symbol tag value.

Example 11 includes the subject matter of any of Examples 1-10, and wherein to index the literal increment table comprises to look up the literal increment value in the literal increment table stored in a cache memory of a processor of the computing device.

Example 12 includes the subject matter of any of Examples 1-11, and wherein to determine the data length comprises to index a length table with the symbol tag value.

Example 13 includes the subject matter of any of Examples 1-12, and wherein to index the length table comprises to look up the data length in the length table stored in a cache memory of a processor of the computing device.

Example 14 includes the subject matter of any of Examples 1-13, and wherein to read the symbol tag value comprises to read a tag byte from the memory location identified by the input pointer.

Example 15 includes the subject matter of any of Examples 1-14, and wherein to determine the offset value as a function of the symbol tag value comprises to right-shift the tag byte by five bits; conditionally set the tag byte to a second offset byte read from the memory location identified by the input pointer if the symbol tag value includes a two-byte offset class value and to the tag byte if the symbol tag value does not include the two-byte offset class value; and concatenate the tag byte and a first offset byte read from the memory location identified by the input pointer to generate the offset value.

Example 16 includes the subject matter of any of Examples 1-15, and wherein to conditionally set the tag byte comprises to conditionally set the tag byte using a conditional move instruction.

Example 17 includes the subject matter of any of Examples 1-16, and wherein to copy data of the data length from the memory location identified by the source pointer to the memory location identified by the output pointer comprises to determine whether the offset value is less than a predefined block size; determine whether the symbol tag value does not include a literal class value in response to a determination that the offset value is less than the predefined block size; perform a byte-by-byte memory copy of the data length from the memory location identified by the source pointer to the memory location identified by the output pointer in response to a determination that the symbol tag value does not include the literal class value; and perform a block memory copy of the data length using blocks of the predefined block size from the memory location identified by the source pointer to the memory location identified by the output pointer in response to a determination that the offset value is not less than the predefined block size or in response to a determination that the symbol tag value includes the literal class value.

Example 18 includes the subject matter of any of Examples 1-17, and wherein the predefined block size comprises sixteen bytes, thirty-two bytes, or sixty-four bytes.

Example 19 includes the subject matter of any of Examples 1-18, and wherein the output module is further to increment the output pointer by the data length in response to copying of the data.

Example 20 includes the subject matter of any of Examples 1-19, and wherein the input module is further to determine whether additional symbols remain in response to incrementing of the input pointer; and read a next symbol tag value from a memory location identified by the input pointer in response to a determination that additional symbols remain.

Example 21 includes a method for data decompression, the method comprising reading, by a computing device, a symbol tag value from a memory location identified by an input pointer; determining, by the computing device, a next symbol increment value, a literal increment value, a data length, and an offset value as a function of the symbol tag value; conditionally setting, by the computing device, a source pointer to (i) the input pointer plus the literal increment value in response to determining that the symbol tag value includes a literal class value and (ii) to an output pointer minus the offset value in response to determining that the symbol tag value does not include the literal class value; copying, by the computing device, data of the data length from a memory location identified by the source pointer to a memory location identified by the output pointer; and incrementing, by the computing device, the input pointer by the next symbol increment value in response to copying the data.

Example 22 includes the subject matter of Example 21, and wherein conditionally setting the source pointer comprises conditionally setting the source pointer without executing a branch instruction.

Example 23 includes the subject matter of any of Examples 21 and 22, and wherein conditionally setting the source pointer comprises conditionally setting the source pointer using a conditional move instruction.

Example 24 includes the subject matter of any of Examples 21-23, and further including determining, by the computing device as a function of the symbol tag value and prior to determining the next symbol increment value, whether a current symbol can be fast-path decoded; and executing, by the computing device, a slow-path decompression routine in response to determining the current symbol cannot be fast-path decoded.

Example 25 includes the subject matter of any of Examples 21-24, and wherein determining whether the current symbol can be fast-path decoded comprises determining whether the data length has a predefined relationship with a maximum data length as a function of the symbol tag.

Example 26 includes the subject matter of any of Examples 21-25, and wherein determining whether the data length has the predefined relationship with the maximum data length comprises determining whether the data length is greater than sixty bytes.

Example 27 includes the subject matter of any of Examples 21-26, and wherein determining whether the current symbol can be fast-path decoded comprises determining whether the symbol tag includes a four-byte offset class value.

Example 28 includes the subject matter of any of Examples 21-27, and wherein determining the next symbol increment value comprises indexing a next symbol increment table with the symbol tag value.

Example 29 includes the subject matter of any of Examples 21-28, and wherein indexing the next symbol increment table comprises looking up the next symbol increment value in the next symbol increment table stored in a cache memory of a processor of the computing device.

Example 30 includes the subject matter of any of Examples 21-29, and wherein determining the literal increment value comprises indexing a literal increment table with the symbol tag value.

Example 31 includes the subject matter of any of Examples 21-30, and wherein indexing the literal increment table comprises looking up the literal increment value in the literal increment table stored in a cache memory of a processor of the computing device.

Example 32 includes the subject matter of any of Examples 21-31, and wherein determining the data length comprises indexing a length table with the symbol tag value.

Example 33 includes the subject matter of any of Examples 21-32, and wherein indexing the length table comprises looking up the data length in the length table stored in a cache memory of a processor of the computing device.

Example 34 includes the subject matter of any of Examples 21-33, and wherein reading the symbol tag value comprises reading a tag byte from the memory location identified by the input pointer.

Example 35 includes the subject matter of any of Examples 21-34, and wherein determining the offset value as a function of the symbol tag value comprises right-shifting the tag byte by five bits; conditionally setting the tag byte to a second offset byte read from the memory location identified by the input pointer if the symbol tag value includes a two-byte offset class value and to the tag byte if the symbol tag value does not include the two-byte offset class value; and concatenating the tag byte and a first offset byte read from the memory location identified by the input pointer to generate the offset value.

Example 36 includes the subject matter of any of Examples 21-35, and wherein conditionally setting the tag byte comprises conditionally setting the tag byte using a conditional move instruction.

Example 37 includes the subject matter of any of Examples 21-36, and wherein copying data of the data length from the memory location identified by the source pointer to the memory location identified by the output pointer comprises determining whether the offset value is less than a predefined block size; determining whether the symbol tag value does not include a literal class value in response to determining the offset value is less than the predefined block size; performing a byte-by-byte memory copy of the data length from the memory location identified by the source pointer to the memory location identified by the output pointer in response to determining the symbol tag value does not include the literal class value; and performing a block memory copy of the data length using blocks of the predefined block size from the memory location identified by the source pointer to the memory location identified by the output pointer in response to determining the offset value is not less than the predefined block size or in response to determining the symbol tag value includes the literal class value.

Example 38 includes the subject matter of any of Examples 21-37, and wherein the predefined block size comprises sixteen bytes, thirty-two bytes, or sixty-four bytes.

Example 39 includes the subject matter of any of Examples 21-38, and further including incrementing, by the computing device, the output pointer by the data length in response to copying the data.

Example 40 includes the subject matter of any of Examples 21-39, and further including determining, by the computing device, whether additional symbols remain in response to incrementing the input pointer; and reading, by the computing device, a next symbol tag value from a memory location identified by the input pointer in response to determining additional symbols remain.

Example 41 includes a computing device comprising a processor; and a memory having stored therein a plurality of instructions that when executed by the processor cause the computing device to perform the method of any of Examples 21-40.

Example 42 includes one or more machine readable storage media comprising a plurality of instructions stored thereon that in response to being executed result in a computing device performing the method of any of Examples 21-40.

Example 43 includes a computing device comprising means for performing the method of any of Examples 21-40.

Example 44 includes a computing device for data decompression, the computing device comprising means for reading a symbol tag value from a memory location identified by an input pointer; means for determining a next symbol increment value, a literal increment value, a data length, and an offset value as a function of the symbol tag value; means for conditionally setting a source pointer to (i) the input pointer plus the literal increment value in response to determining that the symbol tag value includes a literal class value and (ii) to an output pointer minus the offset value in response to determining that the symbol tag value does not include the literal class value; means for copying data of the data length from a memory location identified by the source pointer to a memory location identified by the output pointer; and means for incrementing the input pointer by the next symbol increment value in response to copying the data.

Example 45 includes the subject matter of Example 44, and wherein the means for conditionally setting the source pointer comprises means for conditionally setting the source pointer without executing a branch instruction.

Example 46 includes the subject matter of any of Examples 44 and 45, and wherein the means for conditionally setting the source pointer comprises means for conditionally setting the source pointer using a conditional move instruction.

Example 47 includes the subject matter of any of Examples 44-46, and further including means for determining, as a function of the symbol tag value and prior to determining the next symbol increment value, whether a current symbol can be fast-path decoded; and means for executing a slow-path decompression routine in response to determining the current symbol cannot be fast-path decoded.

Example 48 includes the subject matter of any of Examples 44-47, and wherein the means for determining whether the current symbol can be fast-path decoded comprises means for determining whether the data length has a predefined relationship with a maximum data length as a function of the symbol tag.

Example 49 includes the subject matter of any of Examples 44-48, and wherein the means for determining whether the data length has the predefined relationship with the maximum data length comprises means for determining whether the data length is greater than sixty bytes.

Example 50 includes the subject matter of any of Examples 44-49, and wherein the means for determining whether the current symbol can be fast-path decoded comprises means for determining whether the symbol tag includes a four-byte offset class value.

Example 51 includes the subject matter of any of Examples 44-50, and wherein the means for determining the next symbol increment value comprises means for indexing a next symbol increment table with the symbol tag value.

Example 52 includes the subject matter of any of Examples 44-51, and wherein the means for indexing the next symbol increment table comprises means for looking up the next symbol increment value in the next symbol increment table stored in a cache memory of a processor of the computing device.

Example 53 includes the subject matter of any of Examples 44-52, and wherein the means for determining the literal increment value comprises means for indexing a literal increment table with the symbol tag value.

Example 54 includes the subject matter of any of Examples 44-53, and wherein the means for indexing the literal increment table comprises means for looking up the literal increment value in the literal increment table stored in a cache memory of a processor of the computing device.

Example 55 includes the subject matter of any of Examples 44-54, and wherein the means for determining the data length comprises means for indexing a length table with the symbol tag value.

Example 56 includes the subject matter of any of Examples 44-55, and wherein the means for indexing the length table comprises means for looking up the data length in the length table stored in a cache memory of a processor of the computing device.

Example 57 includes the subject matter of any of Examples 44-56, and wherein the means for reading the symbol tag value comprises means for reading a tag byte from the memory location identified by the input pointer.

Example 58 includes the subject matter of any of Examples 44-57, and wherein the means for determining the offset value as a function of the symbol tag value comprises means for right-shifting the tag byte by five bits; means for conditionally setting the tag byte to a second offset byte read from the memory location identified by the input pointer if the symbol tag value includes a two-byte offset class value and to the tag byte if the symbol tag value does not include the two-byte offset class value; and means for concatenating the tag byte and a first offset byte read from the memory location identified by the input pointer to generate the offset value.

Example 59 includes the subject matter of any of Examples 44-58, and wherein the means for conditionally setting the tag byte comprises means for conditionally setting the tag byte using a conditional move instruction.

Example 60 includes the subject matter of any of Examples 44-59, and wherein the means for copying data of the data length from the memory location identified by the source pointer to the memory location identified by the output pointer comprises means for determining whether the offset value is less than a predefined block size; means for determining whether the symbol tag value does not include a literal class value in response to determining the offset value is less than the predefined block size; means for performing a byte-by-byte memory copy of the data length from the memory location identified by the source pointer to the memory location identified by the output pointer in response to determining the symbol tag value does not include the literal class value; and means for performing a block memory copy of the data length using blocks of the predefined block size from the memory location identified by the source pointer to the memory location identified by the output pointer in response to determining the offset value is not less than the predefined block size or in response to determining the symbol tag value includes the literal class value.

Example 61 includes the subject matter of any of Examples 44-60, and wherein the predefined block size comprises sixteen bytes, thirty-two bytes, or sixty-four bytes.

Example 62 includes the subject matter of any of Examples 44-61, and further including means for incrementing the output pointer by the data length in response to copying the data.

Example 63 includes the subject matter of any of Examples 44-62, and further including means for determining whether additional symbols remain in response to incrementing the input pointer; and means for reading a next symbol tag value from a memory location identified by the input pointer in response to determining additional symbols remain.

1. A computing device for data decompression, the computing device comprising: an input module to read a symbol tag value from a memory location identified by an input pointer; a symbol tag decoding module to determine a next symbol increment value, a literal increment value, a data length, and an offset value as a function of the symbol tag value; a data source module to conditionally set a source pointer to (i) the input pointer plus the literal increment value in response to a determination that the symbol tag value includes a literal class value and (ii) to an output pointer minus the offset value in response to a determination that the symbol tag value does not include the literal class value; and an output module to copy data of the data length from a memory location identified by the source pointer to a memory location identified by the output pointer; wherein the input module is further to increment the input pointer by the next symbol increment value in response to copying of the data.
2. The computing device of claim 1, wherein to conditionally set the source pointer comprises to conditionally set the source pointer without execution of a branch instruction.
3. The computing device of claim 2, wherein to conditionally set the source pointer comprises to conditionally set the source pointer with a conditional move instruction.
4. The computing device of claim 1, further comprising: a slow-path module to execute a slow-path decompression routine in response to a determination that a current symbol cannot be fast-path decoded; wherein the symbol tag decoding module is further to determine, as a function of the symbol tag value and prior to a determination of the next symbol increment value, whether the current symbol can be fast-path decoded.
5. The computing device of claim 4, wherein to determine whether the current symbol can be fast-path decoded comprises to: determine whether the data length has a predefined relationship with a maximum data length as a function of the symbol tag, wherein to determine whether the data length has the predefined relationship with the maximum data length comprises to determine whether the data length is greater than sixty bytes; and determine whether the symbol tag includes a four-byte offset class value.
6. The computing device of claim 1, wherein to determine the next symbol increment value comprises to index a next symbol increment table with the symbol tag value, wherein to index the next symbol increment table comprises to look up the next symbol increment value in the next symbol increment table stored in a cache memory of a processor of the computing device.
7. The computing device of claim 1, wherein to determine the literal increment value comprises to index a literal increment table with the symbol tag value, wherein to index the literal increment table comprises to look up the literal increment value in the literal increment table stored in a cache memory of a processor of the computing device.
 8. The computing device of claim 1, wherein to determine thedata length comprises to index a length table with the symbol tag value,wherein to index the length table comprises to look up the data lengthin the length table stored in a cache memory of a processor of thecomputing device.
 9. The computing device of claim 1, wherein: to readthe symbol tag value comprises to read a tag byte from the memorylocation identified by the input pointer; and to determine the offsetvalue as a function of the symbol tag value comprises to: right-shiftthe tag byte by five bits; conditionally set the tag byte to a secondoffset byte read from the memory location identified by the inputpointer if the symbol tag value includes a two-byte offset class valueand to the tag byte if the symbol tag value does not include thetwo-byte offset class value; and concatenate the tag byte and a firstoffset byte read from the memory location identified by the inputpointer to generate the offset value.
 10. The computing device of claim9, wherein to conditionally set the tag byte comprises to conditionallyset the tag byte using a conditional move instruction.
 11. The computingdevice of claim 1, wherein to copy data of the data length from thememory location identified by the source pointer to the memory locationidentified by the output pointer comprises to: determine whether theoffset value is less than sixteen; determine whether the symbol tagvalue does not include a literal class value in response to adetermination that the offset value is less than sixteen; perform abyte-by-byte memory copy of the data length from the memory locationidentified by the source pointer to the memory location identified bythe output pointer in response to a determination that the symbol tagvalue does not include the literal class value; and perform asixteen-byte block memory copy of the data length from the memorylocation identified by the source pointer to the memory locationidentified by the output pointer in response to a determination that theoffset value is not less than sixteen or in response to a determinationthat the symbol tag value includes the literal class value.
 12. A method for data decompression, the method comprising: reading, by a computing device, a symbol tag value from a memory location identified by an input pointer; determining, by the computing device, a next symbol increment value, a literal increment value, a data length, and an offset value as a function of the symbol tag value; conditionally setting, by the computing device, a source pointer to (i) the input pointer plus the literal increment value in response to determining that the symbol tag value includes a literal class value and (ii) to an output pointer minus the offset value in response to determining that the symbol tag value does not include the literal class value; copying, by the computing device, data of the data length from a memory location identified by the source pointer to a memory location identified by the output pointer; and incrementing, by the computing device, the input pointer by the next symbol increment value in response to copying the data.
 13. The method of claim 12, wherein conditionally setting the source pointer comprises conditionally setting the source pointer without executing a branch instruction.
 14. The method of claim 13, wherein conditionally setting the source pointer comprises conditionally setting the source pointer using a conditional move instruction.
 15. The method of claim 12, further comprising: determining, by the computing device as a function of the symbol tag value and prior to determining the next symbol increment value, whether a current symbol can be fast-path decoded; and executing, by the computing device, a slow-path decompression routine in response to determining the current symbol cannot be fast-path decoded.
 16. The method of claim 12, wherein: reading the symbol tag value comprises reading a tag byte from the memory location identified by the input pointer; and determining the offset value as a function of the symbol tag value comprises: right-shifting the tag byte by five bits; conditionally setting the tag byte to a second offset byte read from the memory location identified by the input pointer if the symbol tag value includes a two-byte offset class value and to the tag byte if the symbol tag value does not include the two-byte offset class value; and concatenating the tag byte and a first offset byte read from the memory location identified by the input pointer to generate the offset value.
 17. The method of claim 16, wherein conditionally setting the tag byte comprises conditionally setting the tag byte using a conditional move instruction.
 18. The method of claim 12, wherein copying data of the data length from the memory location identified by the source pointer to the memory location identified by the output pointer comprises: determining whether the offset value is less than sixteen; determining whether the symbol tag value does not include a literal class value in response to determining the offset value is less than sixteen; performing a byte-by-byte memory copy of the data length from the memory location identified by the source pointer to the memory location identified by the output pointer in response to determining the symbol tag value does not include the literal class value; and performing a sixteen-byte block memory copy of the data length from the memory location identified by the source pointer to the memory location identified by the output pointer in response to determining the offset value is not less than sixteen or in response to determining the symbol tag value includes the literal class value.
 19. One or more computer-readable storage media comprising a plurality of instructions that in response to being executed cause a computing device to: read a symbol tag value from a memory location identified by an input pointer; determine a next symbol increment value, a literal increment value, a data length, and an offset value as a function of the symbol tag value; conditionally set a source pointer to (i) the input pointer plus the literal increment value in response to determining that the symbol tag value includes a literal class value and (ii) to an output pointer minus the offset value in response to determining that the symbol tag value does not include the literal class value; copy data of the data length from a memory location identified by the source pointer to a memory location identified by the output pointer; and increment the input pointer by the next symbol increment value in response to copying the data.
 20. The one or more computer-readable storage media of claim 19, wherein to conditionally set the source pointer comprises to conditionally set the source pointer without executing a branch instruction.